Information is often encoded as an aperiodic chain of building blocks. Modern digital computers use bits as the building 
blocks, but in general the choice of building blocks depends on the nature of the information to be encoded. What are the 
optimal building blocks to encode structural information? This can be analysed by substituting the operations of addition 
and multiplication of conventional arithmetic with translation and rotation. It is argued that at the molecular level, the best 
component for encoding discretised structural information is carbon. Living organisms discovered this billions of years ago, 
and used carbon as the back-bone for constructing proteins that function according to their structure. Structural analysis 
of polypeptide chains shows that an efficient and versatile structural language of 20 building blocks is needed to implement 
all the tasks carried out by proteins. Properties of amino acids indicate that the present triplet genetic code was preceded 
by a more primitive one, coding for 10 amino acids using two nucleotide bases. 

1. Structural information 

It is a characteristic of living organisms to acquire in- 
formation, interpret it and pass it on, often using it and 
refining it along the way. This information can be in 
various forms or languages. It can be genetic informa- 
tion passed on from the parent to the offspring, sensory 
information conveyed by the sense organ to the brain, 
linguistic information communicated by one being to an- 
other, or numerical data entered in a computer for later 
use. It is advantageous to process the information effi- 
ciently, and not in any haphazard manner. In case of 
living organisms, Darwinian selection during evolution 
can be considered the driving force for such optimisation. 
In general, information processing is optimised following 
two guidelines: minimisation of physical resources (time 
as well as space), and minimisation of errors. 

A striking feature of all the forms of information listed 
above is that the messages are represented as aperiodic 
chains of discrete building blocks. Such a representation, 
called digitisation of the message, is commonplace due to 
its many advantages. Discretisation makes it possible to 
correct errors arising from local disturbances, and so it is 
desirable even when the underlying physical variables are 
continuous (e.g. voltages and currents in computers). It 
is also easier to handle several variables each spanning a 
small range than a single variable covering a large range. 
Any desired message can then be constructed by putting 
together as many as necessary of the smaller range vari- 
ables, while the instruction set required to manipulate 
each variable is substantially simplified. This simplification 
means that only a limited number of processes have 
to be physically implemented leading to high speed com- 
putation. An important question, therefore, is to figure 
out the best way of digitising a message, i.e. what should 
be selected as the building blocks of the aperiodic chain. 

The information contained in a message depends on 
the values and locations of the building blocks. Given a 
set of building blocks, Shannon quantified the informa- 
tion contained in a message as its entropy, i.e. a measure 
of the number of possible forms the message could have 
taken. This measure tells us that the information con- 
tent of a message can be increased by eliminating corre- 
lations from it and making it more random. It also tells 
us that local errors in a message can be corrected by 
building long range correlations into it. But it does not 
tell us what building blocks are appropriate for a partic- 
ular message. The choice of building blocks depends on 
the type of the information and not on the amount of 
information. 
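
As a small illustration of the entropy measure mentioned above, the following sketch estimates the Shannon entropy per symbol from the symbol frequencies of a message. It is only a zeroth-order estimate that ignores correlations between neighbouring symbols; the function name `entropy_per_symbol` and the two sample strings are hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy_per_symbol(message):
    """Shannon entropy in bits per symbol, estimated from symbol frequencies only."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four building blocks used equally often (assumed sample message).
uniform = "ACGT" * 4
# Same four building blocks, but one of them dominates (assumed sample message).
biased = "AAAAAAAAAAAAACGT"

print(entropy_per_symbol(uniform))
print(entropy_per_symbol(biased))
```

The uniform message reaches the maximum of two bits per symbol for a four-letter alphabet, while the biased one carries less. The measure says nothing about which alphabet to choose.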

Information can be translated from one language into 
another by replacing one set of building blocks used to 
encode the information by another, e.g. textual informa- 
tion is stored in the computer in a binary form using the 
ASCII code. Nonetheless, physical principles are involved 
in selecting different building blocks for different infor- 
mation processing tasks. For example, our electronic 
computers compute using electrical signals but store the 
results on the disk using magnetic signals; the former re- 
alisation is suitable for quick processing while the latter 
is suitable for long term storage. In selection of building 
blocks with appropriate properties, the foremost practical 
criterion is that it should be easy to distinguish one 
building block from another. This simple criterion allows 
control over errors, and is often sufficient for a heuris- 
tic understanding of the number of building blocks of 
our languages. We use the decimal system of numbers because 
we learnt to count with our fingers. The number 
of phonemes in our languages (i.e. vowels and consonants 
but not the tone) is determined by the number 
of distinct sounds our vocal cords can make. Computers 
and nervous systems use binary code because off/on 
states can be quickly decided with electrical signals. Ge- 
netic information is encoded using four nucleotide bases, 
perhaps because the quantum assembly algorithm is the 
optimal choice for replication at the molecular scale (Pa- 
tel 2001). 

Numerical representation of information is one- 
dimensional and uses building blocks with an ordering 
amongst them (e.g. one is greater than zero). But these 
features may not be present in other types of informa- 
tion. For example, ordering is not required for letters 
of an alphabet, and representation of structural infor- 
mation requires higher dimensional building blocks. To 
find the building blocks most suitable to encode a partic- 
ular type of information, one has to closely inspect the 
relation between the type of information and the physi- 
cal properties and tasks associated with it. 

The information in the genes for the synthesis of pro- 
teins is a clear-cut example of structural information. 
Living organisms generally do not have access to de- 
sired biomolecules in a readymade form. They first break 
down the ingested food into small building blocks, and 
then assemble the pieces in a precise manner to synthe- 
sise the desired biomolecules. The hereditary DNA is a 
one-dimensional read-only-memory in this process; the 
original DNA strand remains unchanged while the in- 
formation contained in it is copied on the new strand 
that is assembled on top of it. The genes carry the 
blueprint of how to synthesise proteins by joining together 
their building blocks, the amino acids. The role 
of a protein in biochemical reactions is determined by 
its three-dimensional shape and size, and the precise ar- 
rangement of chemical groups at its reaction sites. This 
three-dimensional structural information of proteins is 
encoded as a one-dimensional chain of amino acids, with 
the interactions amongst the amino acids determining 
how the chain would bend and fold to produce protein 
structures. To investigate the details of this mechanism, 
it is natural to ask: what is the best way of encoding 
structural information? This is the question addressed 
in this work. 

Many well-known properties of proteins and their re- 
lation to structural information are summarised in the 
Appendix. This material is provided only for quick reference, 
and those familiar with it can easily skip it. Not 
every protein displays all these properties, 
and there is considerable variation in the behaviour 
of different proteins (e.g. between small and large proteins). 
On the other hand, it has to be emphasised that 
an efficient and yet versatile language must be capable 
of incorporating all the desired features, even though 
specific instances of the language may not display every 
possible feature. The ideal language for proteins, there- 
fore, must have the capability to: (1) fold linear chains 
into three-dimensional structures and also unfold them, 
(2) form three-dimensional structures of different shapes 
and sizes, (3) include different chemical groups as part 
of the amino acids, and (4) show chiral behaviour. 

Any structural transformation of a rigid body can be 
described in terms of two basic operations, translations 
and rotations. The set of all rigid body translations and 
rotations forms the well-known Euclidean group, which 
has been studied in detail by physicists. To construct 
the building blocks of structural information, we have 
to discretise this continuous group and yet maintain its 
features required to encode information. 

We can compare translations and rotations to the fun- 
damental operations of arithmetic — addition and mul- 
tiplication. While addition is nothing but translation 
along the real line, multiplication is quite different from 
rotation. Rotations in our three-dimensional space are 
not commutative and that is of crucial importance in representing 
structural information. (The group of three-dimensional 
rotations is SO(3); its double cover SU(2) can be represented 
using Pauli matrices or quaternions.) The building 
blocks of numerical information are elements of $Z_n$, 
the group of integers modulo n, and the cyclic nature 
of this group represents the order amongst the building 
blocks. The building blocks of structural information 
need to have characteristics of rigid bodies, i.e. specific 
size and orientation in three-dimensional space. To find 
them we have to look for a finite non-commutative group. 
In addition, to address the question of protein structure, 
we should look for transformations that take place at the 
atomic scale. 
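
The contrast drawn above between commutative addition and non-commutative rotation can be seen in a minimal sketch. It uses two 90° rotation matrices about the x and z axes and compares them with addition modulo n; the helper `matmul` and the choice n = 5 are assumptions made only for illustration.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# 90-degree rotations about the x axis and the z axis.
RX = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
RZ = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]

# The order of the two rotations matters.
xz = matmul(RX, RZ)
zx = matmul(RZ, RX)
rotations_commute = (xz == zx)

# Addition in Z_n (integers modulo n) never depends on the order.
n = 5
additions_commute = all((a + b) % n == (b + a) % n
                        for a in range(n) for b in range(n))

print(rotations_commute, additions_commute)
```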

Translations are easily discretised, as uniformly spaced 
units along a polymer chain. The atomic structure of 
matter provides a natural unit for translation — the phys- 
ical size of the building blocks. Indeed, the amino acids 
making up proteins differ from each other in terms of 
their side chemical groups, while their components along 
the chain are identical. Any translation can be built up 
from the elementary operations of addition of a building 
block, deletion of a building block and exchange of two 
adjacent building blocks. 

Rotations are more complicated to discretise. A rea- 
sonable criterion is to demand, on the basis of symmetry, 
that the allowed states be all equivalent and equidistant 
from each other. The largest set of such states can then 
provide an approximate basis for the rotation group, and 
the following properties are quickly discovered: 
• In our three-dimensional world, the largest number of 
equivalent and equidistant states is four. They corre- 
spond to the corners of a regular tetrahedron. One can 
go from any one to any other with equal ease: just one 
step. 

• A tetrahedron is the smallest polyhedron. It is the 
simplest structure that can implement non-commutative 
features of three-dimensional rotations. (In general, the 
simplest unit for tiling a d-dimensional space is a simplex 
with (d + 1) vertices. It is often convenient to construct 
a d-dimensional space as a Cartesian product of d 
one-dimensional spaces, but the simplex is a much more 
flexible unit than a hypercube.) 

• To be able to specify the three-dimensional orientation 
unambiguously, the building blocks should have the capability 
to include a chiral centre. Tetrahedral geometry 
allows that. 

• If quantum dynamics is involved, then the states should 
also be mutually orthogonal, so that they form a basis 
for the Hilbert space. The tetrahedral quantum states 
are mutually orthogonal; they can be obtained from the 
lowest two spherical harmonics, $l = 0$ and $l = 1$. ($l = 0, 1$ 
form the minimal basis set for specifying orientations in 
three-dimensional space. They are the smallest two representations 
of the group of proper rotations, and any 
other representation can be constructed from them by 
tensor products.) Using $sp^3$-hybridisation of atomic orbitals, 
these states can be denoted as: 

$$
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix}
= \frac{1}{2}
\begin{pmatrix}
1 & +1 & +1 & +1 \\
1 & +1 & -1 & -1 \\
1 & -1 & +1 & -1 \\
1 & -1 & -1 & +1
\end{pmatrix}
\begin{pmatrix} s \\ p_x \\ p_y \\ p_z \end{pmatrix}
\tag{1}
$$

The high symmetry of this unitary transformation (all 
elements equal, only signs differ) is related to the equiv- 
alence of the four states. 

• Four is also the largest number of states which can 
be uniquely identified by a single yes/no question in a 
quantum search algorithm (Grover 1997). 
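
The equivalence and orthogonality of the four tetrahedral states listed above can be checked with a short sketch. It assumes the standard textbook form of the $sp^3$ coefficient matrix (rows are the hybrid states, columns are the s, p_x, p_y, p_z components); the names `H`, `dot`, `sq_dist`, `is_orthonormal` and `is_equidistant` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Assumed standard sp^3 coefficients: every entry has magnitude 1/2, only signs differ.
SIGNS = [(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1)]
H = [[Fraction(s, 2) for s in row] for row in SIGNS]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Orthonormal rows mean the transformation is unitary (real orthogonal here).
is_orthonormal = all(dot(H[i], H[j]) == (1 if i == j else 0)
                     for i in range(4) for j in range(4))

# The p-orbital parts (last three columns) point to the corners of a tetrahedron.
directions = [row[1:] for row in H]
sq_dists = {sq_dist(u, v) for u, v in combinations(directions, 2)}
is_equidistant = len(sq_dists) == 1

print(is_orthonormal, is_equidistant)
```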

2. Tetrahedral geometry 

The outstanding example of an element with such 
states is carbon. Moreover, 

• Carbon has the capability to form aperiodic chains, 
where different side chemical groups hang on to a back- 
bone. This capability is a must for encoding information. 
Silicon also possesses the same tetrahedral states, and is 
much more abundant, but it preferentially forms peri- 
odic chains (i.e. regular crystals). 

• If the logic above is repeated in the case of two-dimensional 
rotations, it leads to three equivalent states 
located at the corners of an equilateral triangle. Carbon 
has the capability to form these states as well, by 
$sp^2$-hybridisation of its atomic orbitals. 

• Carbon is the most important structural element forming 
the back-bone of biomolecules. Darwinian selection 
in evolution can be expected to have picked the best 
building blocks out of the available resources. 

With all these pieces fitting together, let us look at 
the tetrahedral group in some detail. The tetrahedral 
group is isomorphic to the permutation group of four 
objects. It has 24 elements, which can be factored into a 
group of 12 proper rotations (or even permutations) and 
reflection (or parity). The 24-element and 12-element 
groups are denoted as $T_d$ and $T$ respectively. 

A regular tetrahedron can be formed by joining alter- 
nate corners of a cube. The centres of the tetrahedron 
and cube then coincide, and this embedding is conve- 
nient for three-dimensional structural analysis of a chain 
with tetrahedral angles. The 12 proper rotations are de- 
composed into the identity operation, rotations around 
3-fold axes and rotations around 2-fold axes. There are 
four 3-fold axes, each joining the centre of the tetrahedron 
with a vertex; +120° and −120° rotations around 
these axes belong to different equivalence classes. There 
are three 2-fold axes, each passing through the centre 
of the tetrahedron and midpoints of its non-intersecting 
edges (equivalently passing through the centres of oppo- 
site faces of the embedding cube). 
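
The counting above can be reproduced by treating the symmetries of the tetrahedron as permutations of its four vertices, as the isomorphism stated earlier suggests. The sketch below is one way to do this, with vertex labels 0 to 3 and helper names (`is_even`, `compose`, `inverse`, `conjugacy_classes`) chosen for illustration. It lists the 24 permutations, keeps the 12 even ones as proper rotations, and splits them into equivalence classes.

```python
from itertools import permutations


def is_even(p):
    """A permutation is even if it has an even number of inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0


def compose(p, q):
    """Permutation obtained by applying q first and then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)


def conjugacy_classes(group):
    """Partition the group into classes {h g h^-1 : h in group}."""
    remaining, classes = set(group), []
    while remaining:
        g = next(iter(remaining))
        cls = {compose(compose(h, g), inverse(h)) for h in group}
        classes.append(cls)
        remaining -= cls
    return classes


all_symmetries = list(permutations(range(4)))             # rotations and reflections
rotations = [p for p in all_symmetries if is_even(p)]      # proper rotations only
class_sizes = sorted(len(c) for c in conjugacy_classes(rotations))

print(len(all_symmetries), len(rotations), class_sizes)
```

The class sizes come out as one identity, three 2-fold rotations, and two separate classes of four 3-fold rotations, matching the statement that +120° and −120° rotations are inequivalent.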

For a carbon atom located at the centre of the tetra- 
hedron, rotations around 3-fold axes correspond to ro- 
tations around its bonds. These single bonds are easy 
to rotate and give rise to different conformations of organic 
molecules. In a polypeptide chain, the orientations 
that can be achieved by rotations around the bonds of 
the $C_\alpha$ atoms are described by the Ramachandran map. 
As shown in Fig. 1, the rotation angles are not uniformly 
populated, but prefer to be in several discrete locations. 
As the stars in the plot show, discretising the angles in 
steps of 120° is not a bad starting point. 

The 2-fold rotation axes bisect the bond angles. If a 
double bond is viewed as a deformation in which two 
tetrahedral bonds are merged together, then the dou- 
ble bond lies along the 2-fold rotation axis. 180° rota- 
tion about this axis corresponds to a transition between 
"trans" and "cis" forms. Most of the peptide bonds have 
the "trans" configuration. But occasional transitions to 
the "cis" form do occur, and they are important for in- 
troducing sharp bends in the chain. 

The parity transformation flips chirality of a struc- 
ture, which is of special significance for many biological 
molecules. Chirality flip is an allowed quantum transformation, 
e.g. the NH$_3$ molecule flips back and forth 
between configurations where the nitrogen atom is above 
and below the plane of three hydrogen atoms. But chi- 
rality flip becomes more difficult as the molecular size in- 
creases, and all the amino acids used as building blocks 
of proteins are known to be L-type (except for achiral 
glycine). Thus reflections are more difficult to imple- 
ment than proper rotations, and can be ignored as far 
as the structural analysis of proteins is concerned. 

Figure 1: The Ramachandran map for chiral L-type 
amino acids, displaying the permitted rotation angles 
for the $C_\alpha$ bonds in polypeptide chains (Ramachandran 
1963). The angles $\phi$ and $\psi$ are periodic. In the approximation 
that embeds the polypeptide chain on a 
diamond lattice in the "trans" configuration, only nine 
discrete possibilities exist for the rotation angles. These 
are marked by stars on the same plot; they are uniformly 
separated by 120° steps. 

3. Packing three-dimensional information 

Multi-dimensional structural information can be encoded 
in several different ways. The complete information 
can be expressed directly, as in holograms (3-dim) 
and movie projections (2-dim). Or it can be arranged as 
an ordered set of lower dimensional segments, as in CT- 
scan (stack of parallel planes covering a 3-dim object) 
and television monitors (set of lines covering a 2-dim 
picture). The choice depends on whether the physical 
means that convey the information are extended or lo- 
cal. When mechanisms exist to look at the whole object 
in one go (e.g. with a wide beam of light), the com- 
plete information can be addressed directly. When only 
one part of the object can be considered at a time (e.g. 
with a narrow beam of electrons), it is more convenient 
to arrange the information as a sequence of small seg- 
ments. When the building blocks themselves have to 
convey the information, the latter format is the obvi- 
ous choice; multi-dimensional arrays are stored as folded 
sequences in computers and proteins are assembled as 
folded polypeptide chains. 

It is possible to assemble arbitrary structures by repet- 
itive arrangement of a single and small enough building 
block. For example, a crystal can be carved into the 
desired shape, and it is sufficient to describe the details 
of the surface (and not the contents of the full volume) 
for that purpose. A crystal can also form rapidly, since 
it can grow from a seed in all directions. But the preferred 
shape of the crystal remains that of the building 
block. To assemble arbitrary shapes using a crystalline 
arrangement, another agency is needed to tell the crystal 
surface to stop growing after it has reached the desired 
position as well as to put the reactive chemical groups at 
specific locations; the building blocks themselves cannot 
carry those instructions. Thus crystal growth is conve- 
nient for making regular patterns, but it is not a good 
choice for assembling irregular shapes. 

The highly non-trivial task of specifying an irregular 
structure can be more easily accomplished by an aperi- 
odic folded chain of building blocks. Then the building 
blocks themselves carry preferences for specific orienta- 
tions at each step. Although such a chain grows slowly 
it does not need help from an external agency to achieve 
its desired shape. This property is a must at the low- 
est level of information processing — the message has to 
carry its own interpretation in terms of its physical properties; 
no other interpreter is available[1]. With a chain 
that knows how to fold itself, the problem of specify- 
ing the three-dimensional structural detail is simplified 
to that of constructing the appropriate one-dimensional 
chain. It is far easier for an external agency to syn- 
thesise aperiodic one-dimensional chains than irregular 
three-dimensional structures. Proteins do not have reg- 
ular shapes; they need all their grooves and cavities (i.e. 
structural defects) for their function, and how they fold 
is decided by their building blocks joined in a polypep- 
tide chain. Another physical reason why proteins have 
to be polypeptide chains that can fold and unfold again 
is that many proteins have to cross membranes and cell 
walls to carry out their tasks. A bulky shape would re- 
quire a big hole in the barrier to be crossed, through 
which many other molecules could also leak. But pro- 
teins unfold to their chain form, slip through a small hole 
in the barrier, and then fold again to their native form. 

Even after picking a folded chain structure, more spec- 
ifications are needed to find the desired building blocks. 
The chain can be uniformly flexible like a piece of string, 
or it can be made of stiff segments alternating with flex- 
ible joints like a chain of metal rings. If all the seg- 
ments of a chain are flexible, then it has to be fully 
tied from all directions to be held in place. Otherwise 
the structure can crumple and collapse. Carbon forms 
many structures with fully saturated bonds, but a com- 
pletely tied three-dimensional form requires rather pre- 
cise folding and cannot accommodate aperiodic build- 
ing blocks easily. Moreover, for a chain to have the ca- 
pacity to fold and unfold again, the side bonds holding 
the folds in the chain must be weaker than the bonds 
along the back-bone. For example, diamond is the hardest 
material, but it is a periodic structure and cannot 
be folded and unfolded again easily. The polyethylene 
back-bone (i.e. $(-\mathrm{CH}_2-)_n$) can accommodate aperiodic 
side groups, but it is too flexible. Given that the side 
group interactions are necessarily weak, structural stability 
can be enhanced by making the back-bone stiffer, 
e.g. by replacing some of the single bonds with double 
bonds that cannot rotate. Polypeptide chains are of this 
type; they have weak side group interactions, and the 
increased stiffness of their non-rotatable peptide bonds 
helps in maintaining the shape of the protein. 

[1] In case of computers, compilers and operating systems provide 
the abstract interpretation for high level information processing. 
But at the lowest level of machine code, the interpretation is built 
into the design of the physical components, i.e. in their responses 
to applied voltages and currents. 

Fig. 2a shows a back-bone with alternating single and 
double bonds. This is the structure of polyacetylene, and 
an aperiodic chain can be constructed by replacing the 
side -H by other chemical groups (e.g. -CH$_3$). The 
trouble with this structure is that the $\pi$-electrons involved 
in double bonds prefer to lower their energy by 
spilling over into neighbouring bonds. This resonance 
phenomenon gives a double bond character to all the 
bonds (the actual bond properties are somewhere in between 
a single and a double bond), and makes the whole 
back-bone planar. A planar back-bone is no good for 
constructing three-dimensional structures. The double 
bonds can be shifted to the side groups to reduce the 
spill over of $\pi$-electrons, as illustrated in Fig. 2b, but 
the resultant structure is still planar. The next possibility 
for the back-bone configuration, which allows stiff 
segments with flexible joints, is to alternate one double 
bond with two single bonds. This is the structure of 
polypeptide chains, as shown in Fig. 2c. The stiff C-N 
peptide bond is created by $\pi$-electrons spilling over from 
the C=O double bond; inclusion of nitrogen atoms in 
the chain ensures that the $\pi$-electrons spill over only on 
one side and not the other. The rotatable single bonds 
of $C_\alpha$ atoms permit construction of three-dimensional 
structures. 
Figure 2: Different possibilities for polymer chains with 
carbon back-bone: (a) $(\mathrm{CH})_n$, (b) $(\mathrm{CO})_n$, (c) polypeptide 
chain. 

Figure 3: Amino acid configurations (in their ionised 
forms): (a) glycine, (b) proline, (c) all the rest. 

The configurations of amino acids that polymerise to 
form the polypeptide chain are shown in Fig. 3. The presence 
of acidic -COOH and basic -NH$_2$ groups in all 
amino acids provides a convenient way to join them in 
polypeptide chains by acid-base neutralisation. 

4. Elementary building blocks 

Having analysed the merits of a polypeptide back-bone 
structure, we now look at the three-dimensional geom- 
etry of a polypeptide chain, but with the simplifying 
assumptions that all the links in the chain are of equal 
length and all the tetrahedral angles are of equal value 
($2\tan^{-1}(\sqrt{2}) \approx 109.5°$). With these assumptions, the 
folded chain lies on a diamond lattice. Although the real 
peptide bond is planar with angles close to 120°, it can 
be fitted reasonably well on the diamond lattice in the 
"trans" configuration (see Fig. 4). The rare "cis" configuration 
takes the chain out of the diamond lattice. Let 
us first keep the "cis" configuration aside, and consider 
the chain in the "trans" configuration only. In a real 
polypeptide chain, variations from equal bond lengths 
and equal angles are within ±10%, and I will analyse 
the above described simplified version using the conven- 
tional polypeptide chain nomenclature. 

The diamond lattice is a face-centred cubic lattice with 
a two-point basis. Let this basis of lattice points be 
(0, 0, 0) and (1/4, 1/4, 1/4) in units of the unit cell. Then 
the bond directions of the diamond lattice are (these are 
the last three columns of the matrix in Eq. (1)): 

$$
\begin{aligned}
e_1 &= (+1/4, +1/4, +1/4) \\
e_2 &= (+1/4, -1/4, -1/4) \\
e_3 &= (-1/4, +1/4, -1/4) \\
e_4 &= (-1/4, -1/4, +1/4)
\end{aligned}
\tag{2}
$$

These directions refer to the lattice point at the ori- 
gin, and thereafter the bond directions at neighbouring 
points are opposite in sign. 
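
As a quick consistency check on these bond directions, the sketch below verifies that every pair of them encloses the tetrahedral angle $2\tan^{-1}(\sqrt{2})$ and that the four directions sum to zero. The vectors are written in units of a quarter of the cell edge, and the names `E`, `dot` and `angle_deg` are chosen here for illustration.

```python
import math
from itertools import combinations

# Bond directions e_1 .. e_4 in units of 1/4 of the cubic cell edge.
E = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    return math.degrees(math.acos(dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))))


tetrahedral_angle = math.degrees(2 * math.atan(math.sqrt(2)))
pair_angles = [angle_deg(u, v) for u, v in combinations(E, 2)]
bond_sum = tuple(sum(components) for components in zip(*E))

print(round(tetrahedral_angle, 2), bond_sum)
```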

We can enumerate all possible configurations of the 
polypeptide chain, by specifying for every peptide bond 
the location of the next peptide bond in the chain. Let 
the reference peptide bond (C-N) be along $e_1$ from the 
origin. The chain prior to this peptide bond is already 
synthesised, so without loss of generality let the location 
of the $C_\alpha$ preceding the reference peptide bond be $e_2$. In 
the "trans" configuration, the N-$C_\alpha$ and the $C_\alpha$-C 
bonds are parallel, so the location of the $C_\alpha$ following the 
reference peptide bond is fixed as $e_1 - e_2$. (The sequence 
$C_\alpha$-C-N-$C_\alpha$ fixes the plane of the peptide bond.) 

There are three possible locations for the next C: $e_1 - e_2 + e_1$, 
$e_1 - e_2 + e_3$ and $e_1 - e_2 + e_4$. From each of these 
three locations, the next peptide bond can proceed along 
three possible directions, excluding the already occupied 
$C_\alpha$-C direction. The $C_\alpha$-C direction and the next 
peptide bond direction fix the plane of the next peptide 
bond. Thus on a diamond lattice, given a peptide bond 
plane, there are 9 possible positions for the next peptide 
bond plane. 
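
The count of nine can be reproduced by direct enumeration on the lattice. The sketch below follows the construction described above, with coordinates in units of a quarter of the cell edge; the variable names (`CA_next`, `C_next`, `N_next`, `next_planes`) are hypothetical, and the sign rule that neighbouring lattice points have opposite bond directions is taken from the text.

```python
# Bond directions e_1 .. e_4 in units of 1/4 of the cubic cell edge.
E = {1: (1, 1, 1), 2: (1, -1, -1), 3: (-1, 1, -1), 4: (-1, -1, 1)}


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


C = (0, 0, 0)              # carbon of the reference peptide bond, at the origin
N = add(C, E[1])           # reference peptide bond C-N along e_1
CA_next = sub(N, E[2])     # "trans": following C_alpha sits at e_1 - e_2

next_planes = []
for k in (1, 3, 4):        # e_2 from C_alpha points back to N, so it is excluded
    C_next = add(CA_next, E[k])
    for j in (1, 2, 3, 4):
        if j == k:         # -e_k from the next C points back to C_alpha
            continue
        N_next = sub(C_next, E[j])
        next_planes.append((C_next, N_next))

print(len(next_planes), len(set(next_planes)))
```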

These 9 orientations are all geometrically equivalent — 
they just reflect the 3-fold rotational symmetry around 
tetrahedral bonds of the $C_\alpha$ atom. When the $C_\alpha$-N bond 
is held in position, there are three equivalent choices of 
$\phi$, and when the $C_\alpha$-C bond is held in position, there are 
three equivalent choices of $\psi$. 

For each of the polypeptide back-bone configurations 
described above, there are two remaining directions for 
other groups to attach to the $C_\alpha$ atom. One direction 
is attached to the R-group of the amino acid, while the 
other to a hydrogen atom. There are two arrangements 
possible, and they correspond to opposite chirality. Detailed 
model-building studies have shown that all the 
R-groups in a polypeptide chain must be of the same 
stereoisomer for the stability of regular secondary structures 
(e.g. $\alpha$-helices and $\beta$-sheets). All the amino acids 
naturally occurring in proteins are L-type. Altogether, 
therefore, there remain 9 possible ways of adding a new 
L-type amino acid to an existing polypeptide chain. (Instead 
of an R-group, glycine has two hydrogen atoms 
attached to the $C_\alpha$ atom. That makes glycine achiral, 
but the number of attachment possibilities for the $C_\alpha$ 
atom remains one.) 

The Ramachandran map shown in Fig. 1 is constructed 
using the bond lengths and angles in an actual polypeptide 
chain. The nine points corresponding to the "trans" 
configuration discrete chain are marked as stars on the 
same plot. It is easily seen that the discrete approximation 
is not too far off reality, even though it cannot 
describe all the details of the $\phi$-$\psi$ angular distribution. 
Actually, the plot in Fig. 1 does not include glycine and 
proline; their structural preferences are somewhat different. 
The region around ($\phi$ = 60°, $\psi$ = −60°) is not 
occupied in the Ramachandran map because of steric 
conflict between the side chain R-group and the atoms 
in the polypeptide back-bone. Glycine with no side chain 
does not have this conflict and can occupy this region; 
its Ramachandran map has inversion symmetry. In case 
of proline, the rigid imino ring does not allow the N-$C_\alpha$ 
bond to rotate, and $\phi$ is constrained to be around −60°. 
In case of a real polypeptide chain, embedding it on the 
diamond lattice will distort its shape; the extent of distortion 
will then be a measure of usefulness of the discretised 
description. (The peptide bond is a little shorter 
than the single bonds and its bond angles of 120° are 
somewhat wider than the tetrahedral angle. These two 
deviations tend to compensate for each other to some 
extent.) 

Now we can look at the "cis" configuration of the peptide 
bond. It is obtained from the "trans" configuration 
by rotating the N-$C_\alpha$ bond by 180° around the 
peptide bond axis (see Fig. 4). With the peptide bond 
along $e_1$ and the preceding $C_\alpha$-C bond along $-e_2$, 
the "cis" configuration N-$C_\alpha$ bond is along $\frac{2}{3}e_1 + e_2$. 
This orientation does not fit in the face-centred cubic 
diamond lattice, but it can be fitted in the hexagonal diamond 
lattice[2] with the hexagonal symmetry axis along 
$e_1$. It is well-known that the three-dimensional closest 
packing of spheres can be viewed as a stack of two-dimensional 
layers. There are three possible positions 
for the layers, and each layer has to be displaced relative 
to the ones on either side of it. There are, therefore, 
two distinct ways to add a new layer onto an existing 
stack. The face-centred cubic lattice corresponds to the 
layer sequence ...ABCABCABC..., the hexagonal lattice 
corresponds to the sequence ...ABABAB..., and 
random sequences are also possible. An insertion of a 
"cis" peptide bond in an otherwise "trans" peptide chain 
corresponds to a flip in the layer sequence of the type 
...ABCABCBACBA.... This flip has no effect on the 
9 possibilities for the subsequent rotation angles $\phi$ and 
$\psi$, and further elongation of the polypeptide chain. Thus 
we can count the trans-cis transformation as one more 
elementary structural operation. 
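
The direction of the N-$C_\alpha$ bond in the "cis" configuration follows from a half-turn of the "trans" bond direction about the peptide bond axis. A minimal sketch of that calculation, using exact fractions and the half-turn formula $v' = 2(v \cdot n)n - v$ for a unit axis $n$, is given below; the function name `rotate_half_turn` and the other identifiers are assumptions made for this example.

```python
from fractions import Fraction

q = Fraction(1, 4)
E1 = (q, q, q)
E2 = (q, -q, -q)
E3 = (-q, q, -q)
E4 = (-q, -q, q)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rotate_half_turn(v, axis):
    """Rotate v by 180 degrees about the given (not necessarily unit) axis."""
    scale = 2 * dot(v, axis) / dot(axis, axis)
    return tuple(scale * a - x for a, x in zip(axis, v))


trans_bond = tuple(-x for x in E2)            # "trans" N-C_alpha bond along -e_2
cis_bond = rotate_half_turn(trans_bond, E1)   # half-turn about the peptide bond e_1
expected = tuple(Fraction(2, 3) * a + b for a, b in zip(E1, E2))

# Bond directions available on the face-centred cubic diamond lattice.
cubic_directions = {d for e in (E1, E2, E3, E4) for d in (e, tuple(-x for x in e))}
fits_cubic_lattice = cis_bond in cubic_directions

print(cis_bond == expected, fits_cubic_lattice)
```

The rotated bond keeps its length but is not one of the eight cubic diamond bond directions, which is why the "cis" step leaves that lattice.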

The 10 operations described above exhaust the "ele- 
mentary logic gates" for the polypeptide chain embed- 
ded on a diamond lattice, i.e. by implementing these 10 
operations one can fold the polypeptide chain on a dia- 
mond lattice in any desired configuration. Long distance 
connections (disulfide and hydrogen bonds) are impor- 
tant for structural stability of the polypeptide chain, but 
they do not give rise to new configurational possibilities. 



Figure 4: Peptide bond configurations: (a) trans, (b) cis. 

[2] Carbon can also form a hexagonal diamond lattice, with the 
same tetravalent bonds and density as the face-centred cubic diamond 
lattice. Such hexagonal diamond crystals do not occur terrestrially, 
but they have been found in meteorites and have been 
synthesised in the laboratory. 



5. Putting things together 

There is no clear association of any amino acid with 
the 9 discrete points in the Ramachandran map. Even 
the most rigid proline occurs in different orientations, 
and though only glycine can occupy the region around 
(φ = 60°, ψ = −60°) it occurs in other orientations
too. Indeed, just the composition of a particular amino 
acid does not decide its configuration in the polypeptide 
chain; rather the overall interactions of its R-group with 
those that precede it and those that follow it fix the con- 
figuration. When the orientation of the middle amino 
acid depends on the amino acids that precede and follow 
it, the structural code is necessarily an overlapping one. 
Even with an overlapping structural code, every time an 
amino acid is added to the polypeptide chain, the orientation
of one amino acid gets decided. So a maximally
overlapping efficient code needs at least 9 amino acids
(maybe 10, to include the trans-cis transformation) to
construct a polypeptide chain of arbitrary configuration³.
The R-group properties of amino acids have been stud- 
ied in detail: polar and non-polar, positive and negative 
charge, straight chains and rings, short and long chains, 
and so on. Still, which sequence of amino acids leads
to which conformation of the chain is a coding problem
that has not been solved yet.
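
The counting behind the "at least 9 amino acids" estimate can be illustrated with a short sketch. It assumes, as a simplification, that each residue independently takes one of 9 orientations, so a chain of n residues has 9^n configurations; the function name and the sample alphabet sizes are hypothetical choices made for this example.

```python
def min_message_length(n_residues, n_orientations, alphabet_size):
    """Smallest number of letters from an alphabet of the given size that
    can label every one of n_orientations**n_residues configurations."""
    if alphabet_size < 2:
        raise ValueError("alphabet must have at least two letters")
    configurations = n_orientations ** n_residues
    length, capacity = 0, 1
    while capacity < configurations:   # exact integer arithmetic
        capacity *= alphabet_size
        length += 1
    return length

# Message length needed to specify a 100-residue chain with 9 orientations
# per residue, for several alphabet sizes.
for k in (2, 3, 4, 9):
    print(k, min_message_length(100, 9, k))
```

One letter per residue suffices only when the alphabet has at least 9 letters; a smaller alphabet can still encode the same structure, but only by lengthening the message.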

At this stage, it is instructive to observe that the 20 
amino acids are divided into two classes of 10 each, ac- 
cording to the properties of their aminoacyl-tRNA syn- 
thetases (Eriani et al. 1990, Arnez and Moras 1997, 
Lewin 2000). The two classes of synthetases totally dif- 
fer from each other in their active sites and in how they 
attach amino acids to the tRNA molecules. The lack 
of any apparent relationship between the two classes of 
synthetases has led to the conjecture that the two classes 
evolved independently, and that an early form of life could have
existed with proteins made up of only 10 amino acids of 
one type or the other. A closer inspection of the R-group 
properties of amino acids in the two classes reveals that 
each property (polar, non-polar, ring/aromatic, positive
and negative charge) is equally divided amongst the two 
classes, as shown in Table 1. Not only that, but the 
heavier amino acids with each property belong to class 
I, while the lighter ones belong to class II. This division 
of amino acids according to the length of their side chains 
has unambiguous structural significance. The diamond 
lattice structure is quite loosely packed with many cavi- 



3 Information is localised in individual building blocks in a 
strictly local code (e.g. our system of writing numbers), but it 
is not so in an overlapping code (e.g. pronunciation of vowels in 
English words often depends on the neighbouring letters). As long
as the total information content of the message remains the same, 
one can map the two codes into each other by changing variables. 
When both the codes are efficient, the total length of the message 
and the total information content remain the same, and the num- 
ber of building blocks required cannot change. One can reduce the 
number of building blocks at the expense of increasing the length 
of the message, but that would not make the best use of available 
resources. 



Amino acid            R-group property   Mol. wt.   Class
--------------------  -----------------  ---------  -----
Gly (Glycine)         Non-polar            75       II
Ala (Alanine)         Non-polar            89       II
Pro (Proline)         Non-polar           115       II
Val (Valine)          Non-polar           117       I
Leu (Leucine)         Non-polar           131       I
Ile (Isoleucine)      Non-polar           131       I
Ser (Serine)          Polar               105       II
Thr (Threonine)       Polar               119       II
Asn (Asparagine)      Polar               132       II
Cys (Cysteine)        Polar               121       I
Met (Methionine)      Polar               149       I
Gln (Glutamine)       Polar               146       I
Asp (Aspartate)       Negative charge     133       II
Glu (Glutamate)       Negative charge     147       I
Lys (Lysine)          Positive charge     146       II
Arg (Arginine)        Positive charge     174       I
His (Histidine)       Ring/Aromatic       155       II
Phe (Phenylalanine)   Ring/Aromatic       165       II
Tyr (Tyrosine)        Ring/Aromatic       181       I
Trp (Tryptophan)      Ring/Aromatic       204       I



Table 1: Properties of the amino acids depend on their 
side chain R-groups. Larger molecular weights indicate 
longer side chains. The 20 amino acids naturally occur- 
ring in proteins have been divided into two classes of 10 
each, depending on the properties of aminoacyl-tRNA 
synthetases that bind the amino acids to tRNA. These 
classes divide amino acids with each R-group property 
equally, the longer side chains correspond to class I and 
the shorter ones correspond to class II. Some specific 
properties not explicit in the table are: asparagine is a 
shorter side chain version of glutamine, histidine has an 
R-group with a small positive charge but it is close to 
being neutral, and both the sulphur containing amino 
acids (cysteine and methionine) belong to class I. 



ties of different sizes. The use of long side chains to fill 
up big cavities and short side chains to fill up small ones 
can produce a dense compact structure, and proteins in- 
deed are tightly packed close to the maximum packing 
fraction. Thus we arrive at a structural explanation for 
the 20 amino acids as building blocks of proteins, a fac- 
tor of 10 for conformations of the polypeptide back-bone 
and a factor of 2 for the length of the R-group. (It can 
be noted that each class contains one special amino acid 
involved in tasks beyond the 9 folding possibilities on 
the diamond lattice. Proline is the special amino acid 
in class II, involved in trans-cis transformations, while 
cysteine is the special amino acid in class I, involved in 
tying together far separated regions of the polypeptide 
chain.) 
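
The two claims made about Table 1 (each R-group property is split equally between the classes, and class I holds the heavier members) can be checked directly from the tabulated values. The sketch below does only that; the data are copied from Table 1, while the helper name `summarise` is a hypothetical choice.

```python
from collections import defaultdict
from statistics import mean

# (amino acid, R-group property, molecular weight, synthetase class),
# copied from Table 1.
TABLE_1 = [
    ("Gly", "Non-polar", 75, "II"), ("Ala", "Non-polar", 89, "II"),
    ("Pro", "Non-polar", 115, "II"), ("Val", "Non-polar", 117, "I"),
    ("Leu", "Non-polar", 131, "I"), ("Ile", "Non-polar", 131, "I"),
    ("Ser", "Polar", 105, "II"), ("Thr", "Polar", 119, "II"),
    ("Asn", "Polar", 132, "II"), ("Cys", "Polar", 121, "I"),
    ("Met", "Polar", 149, "I"), ("Gln", "Polar", 146, "I"),
    ("Asp", "Negative charge", 133, "II"), ("Glu", "Negative charge", 147, "I"),
    ("Lys", "Positive charge", 146, "II"), ("Arg", "Positive charge", 174, "I"),
    ("His", "Ring/Aromatic", 155, "II"), ("Phe", "Ring/Aromatic", 165, "II"),
    ("Tyr", "Ring/Aromatic", 181, "I"), ("Trp", "Ring/Aromatic", 204, "I"),
]

def summarise(table):
    """For each property, return {class: (count, mean molecular weight)}."""
    groups = defaultdict(lambda: {"I": [], "II": []})
    for _name, prop, weight, cls in table:
        groups[prop][cls].append(weight)
    return {
        prop: {cls: (len(ws), mean(ws)) for cls, ws in by_class.items()}
        for prop, by_class in groups.items()
    }

summary = summarise(TABLE_1)
for prop, by_class in summary.items():
    print(prop, by_class)
```

With these values every property is split evenly, and the mean molecular weight is larger in class I for each property. The ordering is a trend rather than a strict rule for individual members: within the polar group, cysteine (121, class I) is lighter than asparagine (132, class II).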

A look at the optimal solutions of the quantum search 
algorithm (Grover 1997) brings out another interesting 
feature. Living organisms form DNA and polypeptide 
chains by joining their building blocks together in se- 



quential assembly line operations. The templates for the 
assembly, hereditary DNA and mRNA, are preexisting 
objects in these processes. The building blocks are avail- 
able as a random collection, and the correct ones are se- 
lected by ensuring that they form appropriate molecular 
bonds with the templates. Molecular bonds are binary 
questions — they either form or do not form. The opti- 
misation criterion is to select the correct building block 
from the random collection by asking the minimum num- 
ber of questions. With binary questions, the best clas- 
sical search algorithm is a binary search, but the best 
quantum search algorithm is different. Identification of
the nucleotide base-pairing with a binary quantum question
provided two significant results for genetic information
processing (Patel 2000): the largest number of items
that can be distinguished by one quantum question is 4, 
and by three quantum questions is 20.2. These numbers 
match the number of letters in DNA and protein alpha- 
bets, and the triplet code translating between them. The 
same algorithm also predicts that the largest number of 
items that can be distinguished by two quantum ques- 
tions is 10.5. (The non-integer number of items means 
that the algorithm has an intrinsic error. In the case of 
two queries and 10 items, the error rate is about 1 part 
in 1000.) A two nucleotide base code is thus optimal for 
distinguishing 10 amino acids. 
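
The numbers 4, 10.5 and 20.2 can be reproduced with a few lines of code. The sketch assumes the standard relation for Grover's algorithm, (2Q+1) arcsin(1/√N) = π/2, linking the number of queries Q to the number of items N; this section quotes only the results, so the formula is supplied here as an assumption, and the function names are hypothetical.

```python
import math

def max_items(queries):
    """Solve (2Q+1) * asin(1/sqrt(N)) = pi/2 for N (may be non-integer)."""
    theta = math.pi / (2 * (2 * queries + 1))
    return 1.0 / math.sin(theta) ** 2

def error_probability(queries, n_items):
    """Failure probability of Grover search after `queries` iterations."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return 1.0 - math.sin((2 * queries + 1) * theta) ** 2

for q in (1, 2, 3):
    print(q, round(max_items(q), 1))

# Error incurred when two queries are used to pick one of exactly 10 items.
print(error_probability(2, 10))
```

Under this assumption the error for two queries and 10 items comes out at roughly 1.4 parts in 1000, the same order of magnitude as the figure quoted above.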

The experimentally observed wobble rules are consis- 
tent with the idea that an earlier genetic code used only 
two nucleotide bases of every codon and synthesised a 
smaller number of amino acids (Crick 1966). Another 
feature supporting this idea is that in the present ge- 
netic code similar codons code for amino acids with sim- 
ilar R-group properties. It is therefore possible that the 
third codon letter entered the present genetic code as a class
label (classical and not quantum), when two independent
codes corresponding to long and short R-groups merged 
together during the course of evolution. Such symbio- 
sis would not be uncommon — there is evidence that the 
cellular organelles mitochondria and chloroplasts, with 
their own genetic material, first developed independently 
and were later incorporated in ancestral cells with eu- 
karyotic nuclei. 

Combining all these arguments, we can now construct 
a possible scenario of how the present genetic code arose 
from a more primitive one. Since the discovery of the 
genetic code, many attempts have been made to find 
its simpler predecessors (see for instance, Crick 1968, 
Kolaskar and Ramabrahmam 1982, Ikehara 2002). The 
present genetic code is too complex to have arisen in 
one go, and all the simpler predecessors use fewer nu- 
cleotide bases and fewer amino acids. The most impor- 
tant criterion in these attempts is that continuity has 
to be maintained in evolution — a drastic change will not 
permit the organism to survive. The changes therefore 
have to combine many small steps, each small change in 
the code providing a certain advantage in functionality. 
The scenario suggested by the preceding arguments is 



similar to what Crick proposed many years ago (Crick 
1968): 

(1) The primitive code was a triplet one due to some 
unidentified reasons. The first two letters coded for 10 
amino acids, while the third letter was a non-coding sep- 
aration mark. The individual genes were separate and 
not joined together, and so START and STOP signals 
were not needed. 

(2) This primitive code synthesised the simpler class II 
amino acids. The information about how the polypep- 
tide chain twists and turns at each step was incorporated 
in this code. The short side chains of class II amino acids, 
however, could not completely fill all the cavities in the 
three-dimensional protein structure. 

(3) The longer class I amino acids replaced the short 
ones of similar property at a later stage, wherever big 
cavities existed. This filling up of cavities increased the 
structural stability of proteins. 

(4) The third letter was put into use as a double-valued 
classical label for the amino acid class. That allowed 
coding for 20 amino acids. 

(5) Further optimisation of the code occurred with some 
juggling of codons, since 20 amino acids can be coded ei- 
ther by one classical and two quantum queries or by three 
quantum queries. Also, many genes joined together and 
START and STOP signals were inserted. 

(6) Similar codons for similar amino acids and the wob- 
ble rules are relics of the doubling of the genetic code, 
indicative of the past but no longer perfectly realised. 
In the absence of any knowledge of the doublet code, it 
is not possible to pin-point this scenario any further, and 
variations can be imagined. 

Further progress along this direction requires solutions 
of two puzzles. First, as already pointed out above, we 
need to identify which amino acid subsequence corre- 
sponds to which structural building block. There is no 
clear criterion regarding how long the amino acid subsequence
should be before it assumes a definite shape; considering
the interactions of an amino acid with the two preceding ones
and the two following ones may be a good enough beginning. The
protein structure data accumulated in databases should 
help in such an analysis. Second, we need to guess the 
doublet code assignments from the known triplet code. 
This has already been studied to some extent within the 
context of the wobble rules, but it should be investigated 
in more detail keeping the constraints of the aminoacyl- 
tRNA synthetase classes in mind. 

6. Summary and outlook 

I have looked at the structure of proteins from an in- 
formation theory point of view. The emphasis is on the 
three-dimensional structure of the end-product, i.e. how 
should the segments of a polypeptide chain be chosen so 
that it folds into the required shape. The means used to 
achieve that end are secondary, i.e. which amino acids 
should be chosen so that the interactions amongst their 
R-groups make the polypeptide chain fold in the required 



manner. This emphasis is in sharp contrast with the con- 
ventional approach to the protein folding problem, i.e. 
find the three-dimensional structure of the protein, given 
the sequence of amino acids and the interactions of their 
R-groups. The conventional approach requires finding 
the lowest energy configuration of a polypeptide chain, 
and is believed to be NP-hard, because finding the global 
energy minimum with all possible interactions is not at 
all easy. The rephrased problem of structural design 
may not be that hard — the local orientation of a building 
block can be fixed by its interactions with its neighbours; 
it is enough to have a locally stable or metastable con- 
figuration and not necessarily a global energy minimum. 
(Diamond is structurally the strongest material, but it 
is energetically metastable.) Also, there are many ways 
a folded chain can cover a three-dimensional shape, and 
quite likely there is a lot of flexibility in choosing the se- 
quence of amino acids without substantially altering the 
structure of the protein. 

The fundamental operations needed for processing 
structural information are translation and rotation. I 
have shown that carbon and its tetrahedral geometry 
provide the simplest discretisation of these operations. 
For the construction of proteins as folded chains, the 
polypeptide chain is the simplest back-bone containing 
rigid segments alternating with flexible joints. To fold 
this back-bone into arbitrary shapes on a diamond lat- 
tice requires 10 basic operations. The amino acids some- 
how implement these operations by interactions amongst 
their side chain R-groups. 
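
What sets rotations apart from the operations of ordinary arithmetic is that their order matters. The sketch below illustrates this with unit quaternions, one standard way of representing three-dimensional rotations; the choice of representation, the helper names and the particular 90° rotations are assumptions made for this example.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )

def conjugate(q):
    return (q[0], -q[1], -q[2], -q[3])

def axis_angle(axis, degrees):
    """Unit quaternion for a rotation about a unit axis."""
    half = math.radians(degrees) / 2.0
    s = math.sin(half)
    return (math.cos(half), s * axis[0], s * axis[1], s * axis[2])

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q (v -> q v q*)."""
    _, x, y, z = qmul(qmul(q, (0.0,) + tuple(v)), conjugate(q))
    return (x, y, z)

qx = axis_angle((1, 0, 0), 90)   # quarter turn about the x axis
qz = axis_angle((0, 0, 1), 90)   # quarter turn about the z axis
v = (1.0, 0.0, 0.0)

x_then_z = rotate(qz, rotate(qx, v))
z_then_x = rotate(qx, rotate(qz, v))
print(x_then_z)
print(z_then_x)
```

The two orders send the same starting vector to different directions (along y in one case, along z in the other), whereas two translations applied in either order always give the same result.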

I have pointed out that the division of the 20 amino 
acids, by aminoacyl-tRNA synthetases, into two classes 
of 10 each has structural significance. Every R-group 
property is equally divided between the two classes, such 
that the shorter side chains are in class II and the longer 
ones in class I. This is a new observation. Combining this 
fact with the number of discrete operations required to 
fold a polypeptide chain, and the result that two yes/no 
quantum queries can distinguish 10 items, I have pro- 
posed that the present triplet genetic code was preceded 
by a primitive doublet one. How the doublet code was 
converted to a triplet one is a matter of conjecture, and 
I have outlined one possible scenario. 

Knowing the solution selected by evolution has no 
doubt guided my logic. Still, unraveling the optimisation
criteria involved in the design of molecules of life
is a thrilling exercise. It should be kept in mind that 
evolution has discovered its optimal parameters, not by 
logical deduction, but by trial and error experiments (of 
course using the available means). For that reason the 
chosen parameters are not always perfect. On the other 
hand, evolution has had plenty of time for experimen- 
tation, something which we do not, and cannot, have. 
As a result, though evolution is not perfect in finding its 
criteria, it is impressive to say the least! 

A shortcoming of my analysis is that the role played 
by water in protein folding is totally ignored. Water 



molecules do not provide just a uniform background in- 
side the cells; they fit in tetrahedral geometry nicely, 
and are therefore well-suited to fill up empty grooves 
and cavities of proteins. Any explanation of interactions 
amongst the side chain R-groups of amino acids must 
include how the R-groups interact with water. This un- 
finished exercise is for the future. 

The ideas discussed in this work have potential ap- 
plications. Molecular dynamics simulations of protein 
folding have typically been carried out in the continu- 
ous three-dimensional space. The discrete information 
theory language can speed up these simulations, by first 
folding the polypeptide chain on a diamond lattice and 
then switching to the continuous space to fine tune the 
actual atomic positions. In recent years, lattice models 
have been used for the analysis of polypeptide folds. But 
they have mostly used cubic or triangular lattices and 
restricted amino acid properties to a binary hydropho- 
bic/hydrophilic choice (Hart and Istrail 1997, Agarwala 
et al. 1997). These models have had only a limited suc- 
cess in understanding the energy landscape of the protein 
folding process. My analysis suggests that the use of a 
diamond lattice and more details of amino acid proper- 
ties will bring the lattice models closer to reality. Of 
course, if the preferences of amino acid sequences for 
certain lattice positions are known (e.g. from the analysis
of protein structure databases), that can simplify the
simulations further. 
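
As a minimal sketch of what working on a diamond lattice involves, the code below enumerates self-avoiding chains on that lattice and checks the tetrahedral bond angle. It is not the simulation scheme proposed above, only an illustration of the underlying geometry; the coordinate convention (bond vectors of the form (±1, ±1, ±1) with alternating sign on the two sublattices) and the function names are assumptions of this example.

```python
import math

# Bond vectors leaving a site on one sublattice of the diamond lattice
# (cubic cell edge 4 in these units); sites on the other sublattice use
# the negatives of these vectors.
BONDS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def bond_angle_degrees():
    """Angle between two distinct bonds meeting at a site."""
    a, b = BONDS[0], BONDS[1]
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq = sum(x * x for x in a)
    return math.degrees(math.acos(dot / norm_sq))

def count_self_avoiding_walks(n_steps):
    """Count n-step self-avoiding chains starting from the origin."""
    def extend(pos, visited, step):
        if step == n_steps:
            return 1
        sign = 1 if step % 2 == 0 else -1   # sublattices alternate
        total = 0
        for b in BONDS:
            nxt = tuple(p + sign * c for p, c in zip(pos, b))
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, step + 1)
                visited.remove(nxt)
        return total

    origin = (0, 0, 0)
    return extend(origin, {origin}, 0)

print(bond_angle_degrees())
for n in range(1, 7):
    print(n, count_self_avoiding_walks(n))
```

After the first bond, each step has three non-reversing continuations, so the count grows as 4·3^(n-1) until six steps, where a chain can first close a six-membered ring and the self-avoidance constraint starts to remove configurations.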

From the view-point of nanotechnology, construction
of desired molecular structures by chains that know how 
to fold is a conceptual shift from the conventional ap- 
proaches that use external sources (either to carve the 
shapes or to assemble individual building blocks). To 
be able to do that, the folding code of amino acid se- 
quences must be deciphered; wide ranging applications 
are obvious. One can also consider the simpler prob- 
lem of constructing two dimensional patterns by folding 
chains made of flat building blocks. Such an approach 
will be a contrast to the standard techniques of lithog- 
raphy, but it requires first figuring out the appropriate 
two-dimensional building blocks and processes that can 
join them together in chains. Carbon rings and the ge- 
ometry of graphite sheets will no doubt play a central 
role as the optimal ingredients in such an exercise. 

In closing, I want to contrast different information pro- 
cessing paradigms. Electronic computation uses physical 
building blocks and operations based on real variables. 
Quantum computation extends the building blocks and 
operations to the complex numbers. Structural informa- 
tion processing goes still one step further, to the non- 
commutative algebra of quaternions. Systematic analy- 
sis of structural information processing has a long way 
to go. Yet in a sense, it came first — proteins arose 
before genes, nervous signals, spoken and written lan- 
guages, number systems and computers. After all, the 
word "protein" derives from the Greek "protos" meaning 
"first" or "foremost". 
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Appendix: Properties of proteins 

Many careful experiments have been performed over 
the years to study various properties of proteins. Here I 
list some of the important features they have revealed, 
as a quick reference (Lehninger et al. 1993, Creighton 
1992, Fersht 1999): 

• Proteins are synthesised by ribosomes by joining amino 
acids together as a linear chain. The linear chain grad- 
ually folds into the three-dimensional shape unique to 
every protein. The folding occurs by rigid body trans- 
formations of the bonds — deformations such as stretch- 
ing/shrinking/bending of bonds are insignificant. 

• Proteins carry out their tasks by binding to various 
molecules. The binding is highly specific, very much like 
a lock and key arrangement. Structurally stable features, 
precisely located on the protein surface, are necessary for 
this purpose. When a protein fails to fold into its proper 
shape due to some error, it cannot carry out its task and 
that gives rise to a disease. (Sickle-cell anaemia was the 
first genetic disease to be understood in this manner. A 
single mutation substitutes glutamic acid with valine at 
the sixth position in the amino acid sequence. The resul- 
tant defective haemoglobin does not fold correctly and 
is unable to carry out its function properly.) 

• The sequence of amino acids encodes the structural 
information of a protein. Fig. 3 shows the structure of 
individual amino acids. Proteins may include other important
components, e.g. iron in haemoglobin, but the
role of these other components is essentially chemical 
and not structural. 

• The sequence of amino acids is obtained by transla- 
tion from the sequence of nucleotide bases in DNA. This 
translation is necessary because the two languages serve 
two different purposes, and the purposes decide the phys- 
ical components for their realisations. According to the 
cell's need, proteins are synthesised, transported to ap- 
propriate locations to participate in biochemical reac- 
tions, and degraded at the end. The three-dimensional 
shape of the protein plays a critical part in its reactivity. 
The double helical structure of DNA, with nucleotide 
bases hidden inside, protects the one-dimensional infor- 
mation until it is required. DNA replication is also much 
less error-prone than protein synthesis. 

• The physical separation between consecutive amino 
acids in polypeptide chains and consecutive nucleotide 
bases in DNA is about the same, approximately 3.5 Å.
It is not direct stereochemistry, therefore, which is re- 
sponsible for three nucleotide bases being mapped to one 
amino acid in the genetic code. The non-overlapping 
triplet code is likely to have arisen from the need to 



have a sufficient number of amino acids as the required 
building blocks for the three-dimensional protein struc- 
ture. Living organisms had to then set up the complex 
machinery, involving tRNA as adapters connecting nu- 
cleotide bases and amino acids, to carry out the task of 
translation. 

• Correct translation is ensured by the bilingual 
aminoacyl-tRNA synthetases that attach amino acids 
to tRNA molecules with appropriate anticodons. There 
may be several anticodons which map to a particular 
amino acid, but there is only one aminoacyl-tRNA syn- 
thetase per amino acid which carries out the many-to- 
one mapping. Once the tRNA molecules are properly 
charged with amino acids, the ribosomes match the an- 
ticodons of tRNA with codons of mRNA and construct 
the polypeptide chain. 

• Proteins often have to cross membranes and cell walls 
after their synthesis, since they often have to carry out 
their tasks at locations other than their place of synthe- 
sis. Membranes and cell walls cannot afford to have big 
holes (otherwise many molecules would leak), and that 
provides an important reason why proteins are folded
chains. During translocation, proteins unfold to their 
chain form, cross the barrier through a small hole and 
then fold again into their native three-dimensional form. 

• The three-dimensional protein structure specified by 
the sequence of amino acids is essentially unique. Small 
proteins fold on their own, but many large proteins require
the help of molecular chaperones to fold. Globular proteins
that fold on their own can be melted by heat, and
they regain their native form upon cooling. 

• Carbon and nitrogen atoms, joined by strong covalent 
bonds, form the back-bone of the polypeptide chain. In 
this chain rigid peptide bonds alternate with rotatable 
bonds of Cα atoms. The C-N peptide bonds have a
double bond character; the nitrogen atom carries a pos- 
itive charge making its electronic behaviour similar to 
the tetravalent carbon atom. 

• Different amino acids are distinguished from each other 
by their R-groups, which are side chains attached to
the Cα atoms. Amino acid R-groups are of various
types: polar, non-polar, aromatic, positively and negatively
charged. The interactions of these R-groups with
each other and with the ambient water molecules fix the
orientations of the rotatable Cα bonds. These interactions
are weak, and easily influenced by the pH of the
ambient liquid and the temperature. 

• Atoms in proteins are quite densely packed. In terms 
of the van der Waals atomic size, the packing fraction for
proteins is in the range 0.70 to 0.78, compared to 0.74 for
closest packing of identical spheres. The packing fraction 
of a diamond lattice is only 0.34, and the side groups of 
a polypeptide chain folded along a diamond lattice fill 
up the empty spaces. Even then the packing density is 
high along the chain, while the amino acid side groups 
are somewhat loosely packed. Small cavities are filled up 
by water molecules, which fit into tetrahedral geometry nicely.

• The polypeptide chain is synthesised in the fully extended
form, corresponding to φ = 180° = ψ and "trans"
configuration. Certain domains of proteins start folding 
as soon as they are synthesised, indicating that at least 
some of the folding rules are local. 

• The folding process occurs in stages. Local domains 
fold first, essentially due to weak bonds (hydrogen and 
van der Waals). This process is dominated by local
transformations, i.e. proper rotations of bonds of
Cα atoms, and forms well-known structures such as
α-helices and β-sheets. In the next stage, already
folded domains get linked by long-distance connections, 
e.g. disulfide bonds. In the final stage, various separately 
assembled structures, polypeptide chains and chemical 
groups, join together. 

• Regular structures like α-helices and β-sheets are
largely determined by the properties of the polypeptide
back-bone, with a lot of freedom in choice of amino acid
R-groups. It is the irregular twists and turns of the
chain which critically depend on the interactions of the
amino acid R-groups. Several different types of interactions
exist (electric charges, dipoles, hydrogen bonds,
rings, bifurcation and bulk of side chains etc.) to achieve 
all possible shapes. 

• In reality, proteins fold rather rapidly. The folded chain 
is a self-avoiding walk in three-dimensional space. Such 
a walk can get stuck for topological reasons, or it may 
need global criteria to complete its task (traveling sales- 
man type of problems are NP-hard with just local rules). 
An easy escape is to complete the task with multiple 
walks, i.e. start a new walk when the previous one gets 
stuck. Indeed many proteins are made of not a single 
polypeptide chain, but several polypeptide chains entan[...]

[...] utilise molecules of only L-type chirality. The smallest
amino acid glycine is achiral, the next smallest alanine 
is once in a while found in D-type configuration, while 
the rest are always in L-type configuration. There ex- 
ist racemase enzymes which can flip chirality of amino 
acids. Most of the time they convert D-type amino acids 
to L-type for use in protein synthesis. After a cell dies, 
its molecules gradually revert to a mixture of D- and L- 
types. The proportion of D- and L-type molecules in a 
dead cell can indeed be used to figure out how long back 
the cell died. 
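
The packing fractions quoted in the list above (0.74 for closest packing of identical spheres and 0.34 for the diamond lattice) follow from elementary geometry. The sketch below assumes touching hard spheres in the conventional cubic unit cells of the face-centred cubic and diamond structures; the function name is a hypothetical choice.

```python
import math

def packing_fraction(atoms_per_cell, radius_over_edge):
    """Fraction of a cubic unit cell (edge 1) filled by hard spheres."""
    sphere_volume = 4.0 / 3.0 * math.pi * radius_over_edge ** 3
    return atoms_per_cell * sphere_volume

# Face-centred cubic: 4 atoms per cell; neighbours touch along a face
# diagonal, so the radius is sqrt(2)/4 of the cell edge.
fcc = packing_fraction(4, math.sqrt(2) / 4)

# Diamond: 8 atoms per cell; neighbours are a quarter of the body
# diagonal apart, so the radius is sqrt(3)/8 of the cell edge.
diamond = packing_fraction(8, math.sqrt(3) / 8)

print(round(fcc, 2), round(diamond, 2))
```

The closed forms are π/(3√2) for the face-centred cubic case and π√3/16 for diamond, which is the gap that the side groups and water molecules have to fill.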
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